ABSTRACT

Presented is a technical report concerning the use of
a mathematical model describing certain aspects of the duplication
and selection processes in natural genetic adaptation. This
reproductive plan/model occurs in artificial genetics (the use of
ideas from genetics to develop general problem solving techniques for
computers). The reproductive plan is a sequential stochastic process
involving n-tuples (corresponding to chromosomes in genetics) which
may be simple numeric constants or complex structures such as
computer algorithms. The plan also involves a sequence of probability
distributions defined over n-tuples. The report consists of five
chapters: introduction; reproductive plans; deterministic problem 
bases; a chapter divided into sections on the search for an arena, 
the linear additive model, the linear models and pure problem bases; 
and conclusions. An appendix illustrating the theorem involved and a 
list of references conclude the report. (PEB) 
CONVERGENCE PROPERTIES OF A CLASS OF PROBABILISTIC
ADAPTIVE SCHEMES CALLED SEQUENTIAL REPRODUCTIVE PLANS

by 

Nancy Martin 
Chairman: John H. Holland 

A reproductive plan is a mathematical model describing certain 
aspects of the duplication and selection processes in natural genetic 
adaptation. These models occur in artificial genetics, which is the 
use of ideas from genetics to develop general problem solving techniques 
for computers. 

A reproductive plan is a sequential stochastic process involving
n-tuples which correspond to chromosomes in genetics. The individual
elements of the n-tuples, which correspond to genes, may be simple
numeric constants or may be such complex structures as computer
algorithms. The plan also involves a sequence of probability distributions
defined over the n-tuples.

At each step of the stochastic process, one of the n-tuples is 
selected using the current probability distribution. The "value" of 
the selected n-tuple is then obtained from an external function or 
subroutine. This value is then used to define a new probability distri- 
bution for the next step of the process. 

A particular reproductive plan is said to converge if the distributions
developed at each step converge to a distribution which selects
the most valuable n-tuple. We analyse the convergence properties of
several subclasses of reproductive plans. We show that in a suitably
restricted problem domain one subclass, SR1 plans, converges. We also
show that the convergence is not fast enough to achieve finite loss.

By relating reproductive plans to a class of models used in mathematical
psychology, linear additive models, we show that several subclasses of
reproductive plans do not converge.
INTRODUCTION



The Logic of Computers Group at The University of Michigan has
been studying methods of applying the techniques of natural genetic
adaptation to develop adaptive techniques for problem solving with
computers. We will use the term artificial genetics to refer to this
process. We give here a very brief description of the genetic approach
to problem solving in order to demonstrate the origins of the adaptive
procedures we have investigated. For a thorough introduction we
recommend Holland (to be published). Chapter 1 of Hollstien (1971)
relates the artificial genetic approach to adaptive control processes.
Chapter 1 of Cavicchio (1970) relates the artificial genetic approach
to pattern recognition and problems in artificial intelligence.

We separate the world of the adaptive process into two parts:
the adaptive algorithm and the environment with which it must interact.
If an algorithm is to be adaptive then something internal to the
structure of the algorithm must change as time progresses. The first
problem is to find an adequate method of representing that which will
change. As in natural genetics, artificial genetics assumes that
there is a set $\mathcal{B}_t$ of "chromosomes". Each chromosome or string
is an n-tuple $(a_1,\ldots,a_n)$ where the $a_i$ can be simple constants
or quite complex structures such as the instructions for a computer
subroutine. The $a_i$ are referred to as genes. Hollstien (1971) has
investigated the use of Gray codes (also known as reflected codes)
and hash codes to make the representation of information in the n-tuple
more efficient.



We can view the problem of adaptation as a transformation problem:
given a set of strings $\mathcal{B}_t$, obtain a new set of strings $\mathcal{B}_{t+1}$ by applying
a set of operators based on some evaluation of the strings of $\mathcal{B}_t$.
We assume that external to the adaptive process there is an evaluation
function that evaluates the elements of $\mathcal{B}_t$ and that the results of
this evaluation are available to the genetic process. In an artificial
genetic adaptive scheme the operators are modeled after the natural
genetic operators: duplication, crossover, inversion, mutation and
dominance.

In the approach taken by Holland (to be published), the process
is divided into two stages. The duplication operator is applied to
$\mathcal{B}_t$ to provide a new set $\mathcal{B}'_t$ containing multiple copies of some of
the strings of $\mathcal{B}_t$. The number of copies made of a particular string
depends on the evaluation of that string. So duplication is a copying
process that does not alter the individual strings. In the second
stage of the process, the other operators are applied to the set $\mathcal{B}'_t$
to form a new set of strings. Then the evaluation process is used to
reduce the size of this set to form $\mathcal{B}_{t+1}$.
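
As a rough illustration of the duplication stage, the sketch below copies each string a number of times that grows with its evaluation. The proportional allotment rule, the function name `duplicate` and the parameter `copies_total` are assumptions made for this example only; the text says no more than that the number of copies depends on the evaluation.

```python
from typing import Callable, List, Sequence, Tuple

String = Tuple[int, ...]


def duplicate(strings: Sequence[String],
              evaluate: Callable[[String], float],
              copies_total: int) -> List[String]:
    """Return an enlarged set holding multiple copies of the better strings.

    Assumption: copies are allotted in proportion to a positive evaluation
    and rounded to the nearest integer. The strings are never altered.
    """
    values = [evaluate(s) for s in strings]
    total = sum(values)
    enlarged: List[String] = []
    for s, v in zip(strings, values):
        enlarged.extend([s] * round(copies_total * v / total))
    return enlarged


# Hypothetical population of three binary strings, evaluated by their sum.
population = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
enlarged = duplicate(population, evaluate=sum, copies_total=12)
```

The other operators would then act on `enlarged`, and a later evaluation step would cut the set back to its working size.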

The crossover operator is a function which takes two strings
as arguments and creates two new strings by interchanging genes of one
string with the corresponding genes of the other string. For example, if

$$A = (a_1,\ldots,a_n) \quad\text{and}\quad B = (b_1,\ldots,b_n)$$

then the result of a simple crossover operator might be the strings
$(a_1,\ldots,a_i,b_{i+1},\ldots,b_n)$ and
$(b_1,\ldots,b_i,a_{i+1},\ldots,a_n)$.



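The result of a "double" crossover operator might be
$(a_1,\ldots,a_i,b_{i+1},\ldots,b_j,a_{j+1},\ldots,a_n)$ and
$(b_1,\ldots,b_i,a_{i+1},\ldots,a_j,b_{j+1},\ldots,b_n)$.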

The inversion operator is a function of a single string which reverses
the order of some segment of the string. For example, the result
of applying an inversion operator to string A above might be the string
$(a_1,\ldots,a_i,a_j,a_{j-1},\ldots,a_{i+1},a_{j+1},\ldots,a_n)$. The mutation operator makes
random changes in the genes. The dominance operator is only applicable
when the chromosomes have a more specific structure. Essentially,
it chooses which copy of a gene will be effective if there is more than
one representation for the same gene in the string.
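
The operators described above can be sketched on tuples of genes as follows. The function names, the 0-based cut positions and the `alleles`/`rate` parameters of the mutation operator are illustrative assumptions; dominance is left out because it depends on a chromosome structure that the text does not specify.

```python
import random


def crossover(a, b, i):
    """Simple crossover: exchange all genes after cut position i."""
    return tuple(a[:i]) + tuple(b[i:]), tuple(b[:i]) + tuple(a[i:])


def double_crossover(a, b, i, j):
    """Double crossover: exchange only the genes in the slice i..j-1."""
    return (tuple(a[:i]) + tuple(b[i:j]) + tuple(a[j:]),
            tuple(b[:i]) + tuple(a[i:j]) + tuple(b[j:]))


def inversion(a, i, j):
    """Reverse the order of the segment a[i:j], leaving the rest in place."""
    return tuple(a[:i]) + tuple(reversed(a[i:j])) + tuple(a[j:])


def mutation(a, alleles, rate, rng):
    """Replace each gene, with probability `rate`, by a randomly drawn allele."""
    return tuple(rng.choice(alleles) if rng.random() < rate else g for g in a)


A = ("a1", "a2", "a3", "a4", "a5")
B = ("b1", "b2", "b3", "b4", "b5")
first, second = crossover(A, B, 2)
inverted = inversion(A, 1, 4)
```

Each function returns new tuples and leaves its arguments untouched, which mirrors the description of the operators as functions of one or two strings.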

The mutation, crossover and inversion operators can also be applied
at the gene level to change the structure of the individual $a_i$'s.
A review of some of the algebraic aspects of these operators is presented
in Foo and Bosworth (1972).

Experimental work with artificial genetic adaptation has been
carried out by Hollstien (1971), Cavicchio (1970), Bosworth, Foo and
Zeigler (1972) and Frantz (1972). Holland began the theoretical
investigation by analyzing the duplication phase of the process in
(1969) and (1970). The present work is limited to this first phase
of the adaptive scheme.

In order to capture the notion of duplication theoretically, a 
new algorithm called a Reproductive Plan was developed which did not 
use the other operators. It was intended that this algorithm could 
act as a driver program for the total genetic scheme. 

A Reproductive Plan is a sequential stochastic process. If we
view adaptation as a decision theory problem, then we must decide at
each time step which elements of $\mathcal{B}_t$ to choose. The problem is to find
a suitable process to maximize the sum of all the outcomes or to find
a process that eventually only chooses the "best" element of $\mathcal{B}_t$,
according to some measure of "best". One method of making this decision
is to have a probability vector over the space $\mathcal{B}_t$ which changes with
experience. There are many methods, both linear and nonlinear, for
changing the values of the probability vector according to past
performances. Many of these methods have been explored using different
terminology by mathematical psychologists in searching for a model of
behavior. While we are not interested in modeling any actual observed
behavior, we are interested in taking advantage of the analysis that
has been made and extending it to our particular requirements.

Shapiro and Narendra (1969) have compared the performance of 
several mathematical psychology models in the problem of function 
optimization with noise. Norman (1970) has done extensive work in 
analyzing a model which we show is very close to the reproductive plans 
of Holland. 

Non-probabilistic methods of choosing among the elements of $\mathcal{B}_t$
have also been studied. A special problem of the type we are considering
is called the n-armed bandit problem. This problem is generally stated
as follows: we are given n coins with unknown probabilities $p_1,\ldots,p_n$
of heads. At each time step we are to choose which coin to toss.
The objective is to find a sequential decision procedure that maximizes
the limiting proportion of heads. Robbins (1952) was one of the first
statisticians to examine the problem of sequential testing. He developed
a successful rule for the case n = 3 which used the sample mean of the
previous t tosses to choose the coin for the (t+1)st toss. The rule
included a provision that prevented a coin from being tested only a
finite number of times. Later, Robbins (1956) restated the problem
allowing only a fixed number of previous tosses to be used in the
decision making process. This problem is referred to as the bandit
problem with finite memory. The exact interpretation of the term
finite memory has been discussed in the literature and related to
automata theory. For example see Cover (1969) and Hellman and Cover
(1970, 1971).
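
To make the idea of a sample-mean rule with a forced-sampling provision concrete, here is a minimal sketch. It is not Robbins' rule itself: the forcing schedule (perfect-square time steps, round-robin over the coins) and the name `sample_mean_rule` are assumptions chosen only so that every coin is tossed infinitely often as the number of steps grows.

```python
import math
import random


def sample_mean_rule(p, steps, rng):
    """Play an n-armed bandit with a sample-mean rule plus forced sampling.

    p: true head probabilities (hidden from the decision rule itself).
    Assumption: at perfect-square time steps, or while some coin is still
    untested, a coin is forced in round-robin order; otherwise the coin
    with the highest sample mean so far is tossed.
    """
    n = len(p)
    tosses = [0] * n
    heads = [0] * n
    forced = 0
    for t in range(1, steps + 1):
        if math.isqrt(t) ** 2 == t or 0 in tosses:
            coin = forced % n
            forced += 1
        else:
            coin = max(range(n), key=lambda c: heads[c] / tosses[c])
        tosses[coin] += 1
        heads[coin] += rng.random() < p[coin]   # adds 1 on heads, 0 on tails
    return tosses, heads


tosses, heads = sample_mean_rule([0.2, 0.5, 0.8], steps=2000, rng=random.Random(1))
```

Because the forced steps become sparse, they cost a vanishing fraction of the tosses while still preventing any coin from being abandoned after a finite number of trials.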

In the present work we develop the conditions for convergence 
of reproductive plans and relate this to the models of mathematical 
psychology and statistical decision theory. 



CHAPTER 2
REPRODUCTIVE PLANS



In this chapter we define a class of algorithms for adaptation
similar to those developed by Holland (1970). This class of algorithms
treats adaptation as a decision theory problem. There is a set of
possible strategies and a probability vector over the strategies.
There is also an evaluation function which measures the "worth" of a
strategy at a particular time. A procedure sequentially chooses a
strategy according to the probability vector. The procedure does
not have direct access to the evaluation function but receives the
resulting value of the function applied to the strategy chosen.
This value is used to update the probability vector for the next choice.
The object is to have the probability of the "best" strategy in S
approach one as the number of trials increases. If a procedure is
such that the probability of choosing any particular strategy in S
goes to one with the number of trials, we say the procedure converges
to that strategy. We now give definitions to make these notions
precise.

Definition 2.1: <A,μ,S> is a problem basis where

A is a set of possible or admissible structures,

μ is a function that assigns to each structure in A a random
variable. We restrict the choice to the set of finite real
random variables whose moment generating functions exist,

S is a set of strategies for choosing structures from the space
A. A strategy in S is a method for determining a trajectory
or set of trajectories through the space A.
Example 2.2: In a zero-sum, two person game, let the structures of A be
the pure strategies for player one, $|A| = M$,
$T = \{(p_1,\ldots,p_M) \mid 0 \le p_i \le 1,\ \sum_{i=1}^{M} p_i = 1\}$, let $f: S \to T$ be 1-1, onto and
for $s \in S$, s chooses structure $a_i$ of A with probability $p_i$ where $p_i$ is
the ith component of the probability vector f(s). Note that in this
example the word "strategy" is being used in two senses: first in its
usual game theoretic sense and second to refer to elements of S which are
"strategies for choosing strategies". For $a \in A$, the random variable μ(a)
would be the "payoff" function of a. If we assume that player two always
uses the same mixed strategy, then μ(a) is a single random variable as
in Definition 2.1 and <A,μ,S> is a problem basis. However, if player
two changes his strategy with time, μ(a) no longer satisfies Definition 2.1.

Definition 2.3: A problem basis <A,μ,S> is deterministic if for each
$a \in A$, μ(a) is a finite constant (hence its variance is zero), and the
strategies $s \in S$ are not probabilistic.

Example 2.4: If the structures in the set A are vectors in n-space,
then any one of the standard function maximization algorithms of numerical
analysis would be a nonprobabilistic strategy for determining a new
structure of A given one or more "previous" structures and their function
values $\mu(a_i)$. A collection of such algorithms could be the set S in a
deterministic problem basis.

Definition 2.5: A strategy $s \in S$ is a pure strategy if s always selects
the same element of A.

Definition 2.6: A problem basis <A,μ,S> is pure if S is a set of pure
strategies and the variances of the random variables μ(a), $a \in A$, are
nonzero.

The problem of maximizing a real valued function defined on a finite
set of points where there is random noise in the function evaluation
can be considered a pure problem basis. Here $A = \{a_1,\ldots,a_n\}$ is a set
of n points on the real line, $\mu(a_i)$, $i = 1,\ldots,n$, is uniformly distributed
with mean $\theta_i$ in the interval $[\theta_i - 2, \theta_i + 2]$ and S is the set of pure strategies
such that there is one and only one strategy in S for each element of
A. This problem was discussed by Shapiro and Narendra (1969).

The n-armed bandit problem is an example of a pure problem basis
with A the set of n arms and

$$\mu(a_i) = \begin{cases} 1 & \text{with probability } p_i \\ 0 & \text{with probability } 1-p_i \end{cases}$$

Again, S is the set of pure strategies such that there is one and only
one strategy in S for each element of A.

Definition 2.7: A sequential adaptive scheme, SAS, over the problem
basis <A,μ,S> is an algorithm for choosing

1. An initial starting structure of A denoted $A_0$.

2. At each time step, t, a strategy from S to be used to
obtain a new structure.

Definition 2.8: A probabilistic sequential adaptive scheme, PSAS, is
an SAS with the strategy at time t chosen according to a probability
distribution over S.

Definition 2.9: A PSAS over a problem basis <A,μ,S> is said to converge
to the set $H \subset S$ if

$$\lim_{t\to\infty} \sum_{s \in H} P_{s,t} = 1$$

where $P_{s,t}$ is the probability of choosing strategy s at time t.
We will use the following notation:

t                   time variable.

N(t)                the number of distinct strategies an SAS has selected prior to
                    time t.

$s_j$               the jth distinct strategy selected by an SAS.

S(t)                the set of distinct strategies selected prior to time t.
                    $S(t) = \{s_1,s_2,\ldots,s_{N(t)}\}$.

$t_j$               the time strategy $s_j$ is first selected.

$A_s(k)$            the structure that would result if strategy s were being
                    selected by an SAS for the kth time.

$P_{s,t}$           probability of selecting strategy s at time t.

$P_{0,t}$           probability of selecting a strategy from S-S(t) at time t.

w(s,t)              the number of times strategy s has been selected up to and
                    including time t.

$\mu_{s,t}$         an abbreviation for $\mu(A_s(w(s,t)))$, the random variable resulting
                    from the choice of structure made by strategy s at time t.

$P_{S(t)}$          the probability distribution by which a strategy may be selected
                    from S-S(t) at time t. $P_0$ is an initial probability distribution
                    over S.

|S|                 cardinality of the set S.

$\bar\mu(a)$        the mean of the distribution μ(a) for $a \in A$.

$\mu^*_t$           is $\operatorname{lub} \bar\mu(a)$ where the lub is over the set
                    $\{a \mid \text{for some strategy } s \in S,\ A_s(t) = a\}$.

$\hat\mu_{s,t}$     is $\mu_{s,t}$ if strategy s is actually selected at time t and $\mu^*_{w(q,t)}$
                    if strategy s is not selected but strategy q is selected at
                    time t.
In all of the above notation, if the strategy referred to is in
the set S(t), we often use only the subscript. For example we abbreviate
$A_{s_j}(k)$ to $A_j(k)$, $w(s_j,t)$ to $w(j,t)$, and $\mu_{s_j,t}$ to $\mu_{j,t}$.

We now have enough notation to define a general sequential reproductive
plan. There are two calculations which are external to the
procedure. The first is the calculation which determines, given the
strategy selected by the plan, which structure in A is chosen. In a
pure problem basis this calculation always results in the same structure
for a given strategy. The second is the calculation of the "payoff"
of the structure. These calculations are represented by the functions
STRUCTURE and PAYOFF in step 2 of the procedure.

The exact method of selecting new strategies from S-S(t) is not
specified in the SRP procedure. We put the following restrictions on
the function FIND of step 2 of the procedure. The original probability
distribution $P_0$ over S must be modified to be a distribution $P_{S(t)}$ over
S-S(t). We will assume that for |S| finite, if $P_0(s) > 0$ for a particular
strategy s and if $s \notin S(t)$, then $P_{S(t)}(s) \ge P_0(s)$. For |S| infinite, if
$P_0(W) > 0$ for a particular subset $W \subset S$ and if $W \cap S(t) = \emptyset$, then
$P_{S(t)}(W) \ge P_0(W)$.
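
One simple way to meet the restriction on FIND for a finite S is to renormalize the initial distribution over the strategies that have not yet been tried. The sketch below assumes this renormalization; the text does not prescribe it, and the name `restrict_distribution` is hypothetical.

```python
def restrict_distribution(p0, tried):
    """Renormalize the initial distribution p0 over the untried strategies.

    p0:    dict mapping each strategy to its initial probability (sums to 1).
    tried: set of strategies already selected, playing the role of S(t).
    Returns a distribution over S - S(t); dividing by a mass of at most 1
    can only raise each remaining probability.
    """
    remaining = {s: p for s, p in p0.items() if s not in tried}
    mass = sum(remaining.values())
    if mass == 0:
        raise ValueError("no untried strategy has positive probability")
    return {s: p / mass for s, p in remaining.items()}


p0 = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
p_restricted = restrict_distribution(p0, tried={"s1"})
```

Any other rule would do as long as no untried strategy ends up with less probability than it had initially.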

Our first procedure is a general form which will be altered in
subsequent definitions by changing the variables v and v* and by altering
the calculation in step 3.10.

Definition 2.10: SRP, a sequential reproductive plan, is a PSAS over
a problem basis <A,μ,S> where:

1. The values assumed by the random variables are in an interval
$(r_1,r_2)$ where $r_1, r_2$ are finite real numbers.

2. The procedure requires the following functions:

2.1 SELECT($P_{N(t)}$, j) which assigns a value to j according to
the probability vector $P_{N(t)}$. That is, if
$P_{N(t)} = (p_0,p_1,\ldots,p_{N(t)})$, j is assigned the value
i with probability $p_i$.

2.2 FIND($P_0$, S(t), $s_j$) which chooses a new strategy from S-S(t)
according to the restrictions above and labels it $s_j$.

2.3 STRUCTURE($s_j$, $A_j(t)$, $A_0$) which applies strategy $s_j$ to
obtain a new structure $A_j(t)$.

2.4 PAYOFF($A_j(t)$, v, v*) which uses $A_j(t)$ to obtain values for
v and v*.

3. The operation of the procedure is as follows:

3.1 Choose $\theta > 0$, $k_1 > r_1 + 1$, $k_2 > 0$, $P_0$.

3.2 Choose at random an initial structure $A_0$.

3.3 Set N(1) = 0, t = 0, S(t) = ∅.

3.4 Set t = t+1.

3.5 Calculate $P_{N(t)} = (P_{0,t}, P_{1,t}, \ldots, P_{N(t),t})$ by

3.5.1
$$P_{0,t} = \begin{cases} [N(t)+1]^{-(1+\theta)} & \text{if } N(t)+1 \le |S| \\ 0 & \text{otherwise} \end{cases}$$

3.5.2 If N(t) > 0, for $1 \le i \le N(t)$ calculate:
$$P_{i,t} = (1-P_{0,t})\, \frac{\mathrm{Prod}_{i,t-1}}{\sum_{h=1}^{N(t)} \mathrm{Prod}_{h,t-1}}$$

3.6 SELECT($P_{N(t)}$, j).



3.7 If j is 0 then set N(t+1) = N(t)+1, j = N(t+1),
FIND($P_0$, S(t), $s_j$), S(t+1) = S(t) ∪ {$s_j$},
$\mathrm{Prod}_{j,t-1} = P_{j,t}$;
otherwise set N(t+1) = N(t), S(t+1) = S(t).

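3.8 STRUCTURE($s_j$, $A_j(t)$, $A_0$).

3.9 PAYOFF($A_j(t)$, v, v*).

3.10 Calculate: for $1 \le i \le N(t+1)$

3.10.1
$$\hat\mu_{i,t} = \begin{cases} v & \text{for } i = j \\ v^* & \text{for } i \ne j \end{cases}$$

3.10.2
$$\mathrm{Prod}_{i,t} = \mathrm{Prod}_{i,t-1}\,(\hat\mu_{i,t} + k_1)^{k_2}$$

3.11 GOTO step 3.4.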
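
The loop of steps 3.4 to 3.11 can be sketched in Python for a finite, deterministic, pure problem basis. Several points are assumptions of this sketch rather than statements of the text: payoffs are taken to be nonnegative so that `payoff + k1` exceeds 1, FIND draws from the initial distribution renormalized over the untried strategies, a new strategy's running product starts at its selection probability, v is the payoff of the chosen strategy on its kth use, and v* is the best payoff any strategy would give on its kth use. The name `sr1a` and its parameters are hypothetical.

```python
import random


def sr1a(payoffs, p0, steps, theta=1.0, k1=2.0, k2=1.0, rng=None):
    """Run an SR1a-style plan; return the choices made and the final
    relative probabilities of the strategies tried so far.

    payoffs[s](k) is the payoff of strategy s on its kth use.
    p0 is the initial probability distribution over the strategies.
    """
    rng = rng or random.Random()
    size = len(payoffs)
    tried = []      # S(t), in order of first selection
    prod = {}       # running product Prod_i for each tried strategy
    uses = {}       # w(i, t): number of times strategy i has been used
    history = []
    for _ in range(steps):
        n = len(tried)
        # Step 3.5.1: probability of trying a new strategy.
        p_new = (n + 1) ** -(1 + theta) if n + 1 <= size else 0.0
        # Step 3.5.2: share the remaining mass in proportion to Prod_i.
        total = sum(prod.values())
        probs = [(1 - p_new) * prod[s] / total for s in tried]
        # Steps 3.6 and 3.7: SELECT, and FIND when a new strategy is due.
        if rng.random() < p_new:
            rest = [s for s in range(size) if s not in prod]
            weights = [p0[s] for s in rest]
            j = rng.choices(rest, weights=weights)[0]
            tried.append(j)
            prod[j] = p_new * p0[j] / sum(weights)
            uses[j] = 0
        else:
            j = rng.choices(tried, weights=probs)[0]
        # Steps 3.8 and 3.9: STRUCTURE and PAYOFF.
        uses[j] += 1
        v = payoffs[j](uses[j])
        v_star = max(f(uses[j]) for f in payoffs)
        # Step 3.10: the chosen strategy is credited v, all others v*.
        for i in tried:
            prod[i] *= ((v if i == j else v_star) + k1) ** k2
        history.append(j)
    total = sum(prod.values())
    return history, {s: prod[s] / total for s in tried}


# Hypothetical basis: three strategies with constant payoffs 1.0, 0.5, 0.0.
payoffs = [lambda k: 1.0, lambda k: 0.5, lambda k: 0.0]
history, final = sr1a(payoffs, p0=[1 / 3, 1 / 3, 1 / 3], steps=300,
                      rng=random.Random(0))
```

The running products grow without bound, so a longer run would need them rescaled from time to time; rescaling all of them by the same factor leaves the probabilities unchanged.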

Definition 2.11: An SR1a is an SRP where the value returned by PAYOFF
for v is $\mu_{j,t}$ and the value for v* is $\mu^*_{w(j,t)}$. An SR1b is an SRP where
the value returned by PAYOFF for v is $\mu_{j,t}$ and the value for v* is
$\max(\mu^*_{w(j,t)}, \mu_{j,t})$. An SR1 is an SR1a or an SR1b.

The definition of an SR1 differs in several respects from that of
Holland (1970). We have allowed the μ(a) to be random variables. In the
original development μ was a single valued function from A to a finite
subset of the reals $(r_1,r_2)$. This restriction does not allow the
application of the algorithm to such problems as the n-armed bandit problem.
In the Holland paper the notion of a set of environments $\mathcal{E}$ was included.
However, since all of the theoretical work was done with respect to a
fixed element $E \in \mathcal{E}$, this aspect of the plan has been discarded. The
notion of environment could easily be incorporated in the functions μ
since they are external to the algorithm, or in the structural description
of the elements of A. The calculation of the probabilities $P_{i,t}$ in the
SR1 algorithm can be expressed as:

Equation 2.12:
$$P_{i,t} = (1-P_{0,t})\, \frac{P_{i,t_i} \prod_{n=t_i}^{t-1} (\hat\mu_{i,n}+k_1)^{k_2}}{\sum_{h=1}^{N(t)} P_{h,t_h} \prod_{n=t_h}^{t-1} (\hat\mu_{h,n}+k_1)^{k_2}}$$

This is similar to the calculation in the 1970 paper except for the upper
limit of the products in the numerator and denominator. In the 1970
paper this limit is t; in Equation 2.12 the limit is t-1. We cannot use
$\hat\mu_{i,t}$ until it has been calculated and in our algorithm this calculation
comes in step 3.10 and the calculation of $P_{i,t}$ in step 3.5.

The values of the $P_{0,t}$ determine how often a new strategy is to be
tried. One of the weaknesses of this algorithm is that $P_{0,t}$ is not
dependent on the performance of the strategies that have already been
used. It would be preferable to have $P_{0,t}$ be relatively large if the
strategies used so far were not performing well, small if they were doing
well. However, this definition does ensure that with probability 1,
when |S| is finite all of the strategies will be tried by some time $T_f < \infty$.
When |S| is infinite, there is no finite time after which no new plans
are tried.

We observe from Equation 2.12 that for an SR1, if <A,μ,S> is
deterministic, $P_{j,t}$ cannot increase as a result of using strategy j,
except of course, the first time it is used. Also, if the jth strategy
is used at time t, $\mu_{j,t} = \mu^*_{w(j,t)}$, and strategy j has been used before,
the probabilities $P_{i,t}$ do not change. The purpose of using such a method
to change the probability vector over S(t) is to allow the vector to
be more responsive to the performance of the strategy over time. We
wish to avoid converging to false strategies due to their early behavior.
However, it is exactly this property which makes it impossible for the
algorithm to have all of the convergence properties claimed in Holland (1970).

Let us now consider a modified form of Equation 2.12 for an SR1
procedure for finite S. We know that after some time $T_f$, each of the
strategies will have been used at least once. We also note that the
$P_{i,t_i}$ and $P_{h,t_h}$ in the numerator and denominator of Equation 2.12 are
constant factors that do not influence the convergence properties of the
$P_{i,t}$ as $t \to \infty$. In fact, they reinforce the bias of the original
distribution $P_0$ over S. If this distribution is not a good measure of the
value of the strategies, the $P_{j,t_j}$ in the formula could slow down
convergence but never prevent it. If we neglect these factors, and let
$\mu'_{j,t} = (\hat\mu_{j,t} + k_1)^{k_2}$, we have:



Equation 2.13: for $t \ge T_f + 1$
$$P_{j,t} = \frac{\prod_{n=t_j}^{t-1} \mu'_{j,n}}{\sum_{h=1}^{|S|} \prod_{n=t_h}^{t-1} \mu'_{h,n}}$$



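Let i be the strategy actually used at time t and $\mu'^*_t = (\mu^*_{w(i,t)} + k_1)^{k_2}$.
Then Equation 2.13 can be expressed as:

Equation 2.14: for $t \ge T_f + 1$

if i = j:
$$P_{j,t+1} = \frac{\mu'_{j,t}\,P_{j,t}}{\mu'_{j,t}\,P_{j,t} + \mu'^*_t\,(1-P_{j,t})}$$

if i ≠ j:
$$P_{j,t+1} = \frac{\mu'^*_t\,P_{j,t}}{\mu'_{i,t}\,P_{i,t} + \mu'^*_t\,(1-P_{i,t})}$$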
Equation 2.13 has a familiar form. With different interpretations of
the μ, this is the form of the Beta Model of mathematical psychology.
This model has been studied extensively by Luce (1959), Lamperti and
Suppes (1960), Lamperti (1960), Norman (1970) and others. Although
their terminology and interpretations of the model differ from those of
Holland, the mathematical analysis applies to both. The Holland work
claims that under certain restrictions of the problem basis <A,μ,S>
the model not only converges but converges in such a special way that
"bad" strategies are used only a finite number of times. However, the
studies by the mathematical psychologists show that the Beta model
(without Holland's restrictions) does not always converge.
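
The link between the product form of Equation 2.13 and a step-by-step update of Beta Model type can be checked numerically. In the sketch below the strategy used at a step is multiplied by its own transformed payoff and every other strategy by the transformed best payoff, then the vector is renormalized. The function names, the three-strategy example and the particular factor values are assumptions for illustration, and all strategies are taken to start at the same time.

```python
def product_form(factors):
    """Probabilities as normalized products; factors[j] lists the
    transformed payoffs credited to strategy j so far."""
    prods = []
    for seq in factors:
        p = 1.0
        for x in seq:
            p *= x
        prods.append(p)
    total = sum(prods)
    return [p / total for p in prods]


def beta_step(probs, used, mu_used, mu_star):
    """One recursive update: `used` is credited mu_used, all others mu_star."""
    weights = [p * (mu_used if j == used else mu_star)
               for j, p in enumerate(probs)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical run: (strategy used, its transformed payoff, transformed best).
steps = [(0, 2.0, 3.0), (1, 1.5, 2.5), (0, 2.0, 2.0)]
factors = [[], [], []]
probs = product_form(factors)          # uniform start
for used, mu_used, mu_star in steps:
    probs = beta_step(probs, used, mu_used, mu_star)
    for j in range(3):
        factors[j].append(mu_used if j == used else mu_star)
direct = product_form(factors)
```

Both routes give the same vector, which is why results proved for the recursive model carry over to the product form.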

We will show that under restrictions similar to Holland's the SR1
plan does converge, but not as strongly as he claimed. Then we will
examine convergence under other restrictions and relate our results
to those of Norman (1970).



CHAPTER 3 
DETERMINISTIC PROBLEM BASES 



This chapter examines the convergence properties of the SR1 algorithm
over deterministic problem bases.

Definition 3.1: A strategy s in a deterministic problem basis is optimal
iff there exists T such that $\mu(A_s(t)) = \mu^*_t$ for t > T.

Definition 3.2: A strategy s in a deterministic problem basis is
asymptotically optimal iff
$$\sum_{t=1}^{\infty} \left[\mu^*_t - \mu(A_s(t))\right] < \infty.$$

If s is optimal, s is asymptotically optimal.

Definition 3.3: An arena is a deterministic problem basis <A,μ,S> such
that there exists a set $H \subset S$, $P_0(H) > 0$ and
$$\operatorname{glb}_{s \in H}\, \mu(A_s(t)) \ge \mu^*_t - \lambda_t \quad\text{where}\quad \sum_{t=1}^{\infty} \lambda_t = N_\lambda < \infty.$$

It is not hard to show that <A,μ,S> is an arena if and only if S
contains asymptotically optimal strategies with nonzero initial probability.

Definition 3.4: $S_g$, the arena set for the arena <A,μ,S>, is the set of
all asymptotically optimal strategies.

To avoid trivial situations we henceforth assume $S_g$ is a proper
subset of S. At this point it would appear that if our problem basis
is an arena then the optimum algorithm in searching S would converge
to the arena set. Because of the lack of limitation on the sequence
$\{\mu^*_t\}$ this is not necessarily the case. As presently defined, $\mu^*_t$ is the
best that any single strategy could do if used t times. It is not
necessarily the optimum value that any algorithm could achieve at time
t. It is possible that a PSAS which selects an element of S according
to a fixed vector $(p_1,p_2,\ldots,p_{|S|})$ with $p_i \ne 0$ for at least one i not in
the arena set could have a higher payoff than a similar PSAS with $p_i = 0$
for all strategies not in the arena set. We give an example where this
is the case.

Example 3.5: Let strategy 1 have payoff 100/k the kth time it is used,
strategy 2 have payoff 99/k the kth time it is used. Then $\mu^*_t$ is 100/t
and we have an arena set consisting of strategy 1. However, a PSAS that
uses strategy 1 all of the time will not have as good a payoff as one that
uses an appropriate mix of the strategies 1 and 2. Let p be the probability
of choosing strategy 1, 1-p the probability of choosing strategy 2, where
p is fixed for all t and $1 > p \ge 1-p$. Then if M is the number of choices
of strategy 1 in N trials and N > M > .01N, the payoff would be

$$\sum_{k=1}^{M} \frac{100}{k} + \sum_{k=1}^{N-M} \frac{99}{k} > \sum_{k=1}^{N} \frac{100}{k}.$$

For p > 1-p, the expected payoff of the mixed strategy is greater
than the payoff of the arena set.
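
The inequality in Example 3.5 is easy to check numerically. The sketch below compares the total payoff of a mix of the two strategies with the payoff of using strategy 1 alone; the choice N = 1000 and the three values of M are arbitrary illustrations within the range N > M > .01N.

```python
def harmonic(n):
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1.0 / k for k in range(1, n + 1))


def mixed_payoff(m, n):
    """Total payoff when strategy 1 is used m times and strategy 2 n - m times."""
    return 100 * harmonic(m) + 99 * harmonic(n - m)


def arena_payoff(n):
    """Total payoff when strategy 1 is used on all n trials."""
    return 100 * harmonic(n)


N = 1000
gaps = {m: mixed_payoff(m, N) - arena_payoff(N) for m in (20, 500, 990)}
```

Every gap is positive because each strategy's payoff depends on how many times that strategy itself has been used, so splitting the trials restarts both payoff sequences near their large early values.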

To have strategies in a deterministic problem basis have the payoffs 
indicated, both the set A and the strategies would be quite unnatural. 
However, such problem bases are not eliminated by the definition of an 
arena. Consequently we introduce the following notion. 

Definition 3.6: An arena <A,μ,S> is a restricted arena if there exists
a time T such that no PSAS over <A,μ,S> can obtain a payoff at time
t > T higher than $\mu^*_t$.

The rest of this section develops the results necessary to show
that in fact the losses associated with an SR1 algorithm even over a
restricted arena cannot be finite.

Definition 3.7: The absolute loss of strategy s is
$$AL_s = \sum_{t=1}^{\infty} \left(\mu^*_t - \mu(A_s(t))\right).$$

If a strategy s is in the arena set for <A,μ,S> then $AL_s$ is finite,
otherwise $AL_s$ is infinite.
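
The difference between a finite and an infinite absolute loss can be seen from partial sums. The sketch below uses the best payoff 100/t of Example 3.5, a strategy paying 99/t (shortfall 1/t), and a second, purely hypothetical strategy whose shortfall is 1/t²; the function name and both horizons are illustrative.

```python
def partial_absolute_loss(best, payoff, horizon):
    """Sum of best(t) - payoff(t) for t = 1..horizon (deterministic payoffs)."""
    return sum(best(t) - payoff(t) for t in range(1, horizon + 1))


def best(t):
    return 100.0 / t                     # best payoff on the t-th use


def harmonic_gap(t):
    return 99.0 / t                      # shortfall 1/t: partial sums keep growing


def square_gap(t):
    return 100.0 / t - 1.0 / t ** 2      # hypothetical shortfall 1/t^2


slow_100 = partial_absolute_loss(best, harmonic_gap, 100)
slow_10000 = partial_absolute_loss(best, harmonic_gap, 10_000)
fast_10000 = partial_absolute_loss(best, square_gap, 10_000)
```

The first pair of sums grows roughly like the logarithm of the horizon, so that strategy lies outside the arena set, while the last sum stays bounded and the corresponding strategy would be asymptotically optimal.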

Definition 3.8: The actual loss of the strategy s during any particular
use of an algorithm is
$$D_{s,t} = \mu^*_{w(i,t)} - \hat\mu_{s,t}$$
where $s_i$ is the strategy selected at time t. The accumulated actual loss
is
$$D_s = \sum_{t=1}^{\infty} D_{s,t}.$$

According to Definition 3.8, a strategy s has zero loss at time 
t if it is not selected at time t. Therefore when we speak of the actual 
loss incurred by a particular strategy, we only mean the loss resulting 
from the actual selection of that strategy by an algorithm. 

Lemma 3.9: Let $S_g$ be the arena set for an SR1 over a restricted arena
<A,μ,S>. If $s \in S_g$, then the accumulated actual loss $D_s$ is finite.
If $s' \notin S_g$ and s' is, with probability 1, selected infinitely often,
then the accumulated actual loss, $D_{s'}$, is infinite with probability 1.
Proof: The first part of the lemma follows directly from the definitions
of an arena set and of asymptotically optimal strategies. For the second
part observe that:
$$D_{s'} = \sum_{t=1}^{\infty} D_{s',t} = \sum_{k=1}^{\infty} \left(\mu^*_{w(s',n_k)} - \hat\mu_{s',n_k}\right)$$
where $\{n_k\}$ is the infinite set of indices where strategy s' is actually
selected. But $w(s',n_k) = k$ so $\hat\mu_{s',n_k} = \mu(A_{s'}(w(s',n_k))) = \mu(A_{s'}(k))$ and
$$D_{s'} = \sum_{k=1}^{\infty} \left(\mu^*_k - \mu(A_{s'}(k))\right) = AL_{s'} = \infty \quad\text{for } s' \notin S_g.$$

The definition of actual loss incorporates many of the ideas expressed
in Holland (1970). A strategy is to be measured against how well other
strategies would have done if they had been used the same number of times.
However, there is no guarantee that the actual loss directly represents
the loss of using a strategy under SR1 compared with some optimal scheme.

Definition 3.10: The loss incurred by an SR1 algorithm over a finite
modified arena is
$$D = \sum_{t=1}^{\infty} \sum_{s=1}^{|S|} \left(\mu^*_t - \hat\mu_{s,t}\right).$$

Lemma 3.11: The loss of an SR1 algorithm over a finite restricted arena
is infinite if the total accumulated actual loss over all strategies is
infinite. That is, $D = \infty$ if $\sum_{s=1}^{|S|} D_s = \infty$.

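Proof: By Definition 3.6 there exists T such that for t > T and $s \in S$,
$\mu^*_t \ge \mu^*_{w(i,t)}$ where $s_i$ is used at time t. Therefore
$$\mu^*_t - \hat\mu_{s,t} \ge \mu^*_{w(i,t)} - \hat\mu_{s,t}$$
and
$$\sum_{t=T}^{\infty} \sum_{s \in S} \left(\mu^*_t - \hat\mu_{s,t}\right) \ge \sum_{t=T}^{\infty} \sum_{s \in S} \left(\mu^*_{w(i,t)} - \hat\mu_{s,t}\right).$$
The left hand side is D-K, the right hand side is $\sum_{s \in S} D_s - K'$ where
K and K' are finite constants. Therefore
$$D - K \ge \sum_{s \in S} D_s - K' = \infty$$
and $D = \infty$.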
From Lemma 3.9 the accumulated actual loss is not finite if some
strategy s' not in $S_g$ is used infinitely often. We now explore the
conditions under which this happens.

Definition 3.12: H is a limiting set of a PSAS over the arena <A,μ,S>
if $H \subset S$ and there is some time T such that for t > T, with probability
1, only strategies from the set H are selected for use.

If S is finite and there are no proper subsets, $H \subset S$, such that
H is a limiting set and $P_0(H) < 1$, then each of the strategies in S
with nonzero initial probability must be used infinitely often.

We want to develop conditions under which a limiting set does
not exist. Let Q be a PSAS over <A,μ,S>, $H \subset S$ and $\{X_i\}$ a sequence
of random variables defined by
$$X_i = \begin{cases} 1 & \text{if a strategy in H is selected by Q at time } i \\ 0 & \text{otherwise.} \end{cases}$$

We can now represent the choices of strategies of Q as an infinite
vector of the $X_i$. The space of all 0-1 valued infinite vectors is
then the underlying space representing all possible sequences of
choices by Q. The probability distribution over these vectors is
dependent on the vectors $P_{N(t)} = (P_{0,t},P_{1,t},\ldots,P_{N(t),t})$. If we are to
have a limiting set, we are interested in those vectors of the $X_i$
which from some point on have entries of 1 only. Let
$$B_m = \{(X_1,X_2,\ldots,X_i,\ldots) \mid X_i = 1 \text{ for all } i \ge m\}$$
and
$$B = \lim_{m\to\infty} B_m.$$
If the probability of the set B is 0, then H is not a limiting set for
Q. Let



$$\Delta P_m = P(X_{m+1} = 1 \mid X_m = 1, X_{m-1},\ldots,X_1) - P(X_m = 1 \mid X_{m-1},\ldots,X_1).$$

Then $\Delta P_m$ is the increment in the probability of choosing an element
of the set H given that an element in H was chosen at the previous
time step, and irrespective of the other past choices. Intuitively,
if $\Delta P_m$ is negative, one would not expect H to be a limiting set.
This is in fact the case.

Theorem 3.13: Let Q be a PSAS over <A,μ,S>, $H \subset S$ and $X_i$, $B_m$ and B
defined as above relative to H and Q. If there is a time T such that,
for $m \ge T$, $\Delta P_m \le 0$, and if $P(X_m = 1) < 1$ for all finite m then P(B) = 0
and H is not a limiting set for Q.
Proof: For $m \geq T$,

$$P(B_m) = \lim_{n\to\infty} P(X_i = 1,\ m \leq i \leq n \mid X_1, \ldots, X_{m-1}).$$

Since $\Delta P_m \leq 0$,

$$P(B_m) \leq \lim_{n\to\infty} \left[ P(X_m = 1 \mid X_1, \ldots, X_{m-1}) \right]^{n-m+1} = 0.$$

So, for arbitrary $m \geq T$, $P(B_m) = 0$.

But $B = \lim B_m$ and the $B_m$ are a sequence of increasing events.
Therefore, $P(B) = \lim P(B_m)$ [see 1.3.1 Neveu (1965)], so P(B) = 0.
Since P(B) is 0, with probability 1 H is not a limiting set of Q.
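The key step of the proof is that a run of n-m+1 consecutive selections from H has probability at most $p^{n-m+1}$ when each conditional selection probability is bounded by some p < 1. A minimal numeric sketch follows; the helper `run_probability_bound` and the value p = 0.99 are illustrative assumptions, not taken from the report.

```python
def run_probability_bound(p: float, m: int, n: int) -> float:
    """Upper bound p**(n - m + 1) on the probability that a strategy in H
    is selected at every step m, m+1, ..., n, assuming each conditional
    selection probability is at most p (an assumption of this sketch)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < m:
        raise ValueError("need n >= m")
    return p ** (n - m + 1)


if __name__ == "__main__":
    # Even when p is very close to 1 the bound decays geometrically in n.
    for n in (10, 100, 1000, 10000):
        print(n, run_probability_bound(0.99, 1, n))
```

The bound stays at 1 only in the excluded case p = 1, which is why the theorem requires $P(X_m = 1) < 1$.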



Lemma 3.14: Let Q be an SR1 algorithm over a restricted arena <A,μ,S>
with finite S. Let H be the arena set of S, $H \neq S$ and $P_0(H) \neq 1$. Then
(a) $P(X_m = 1) < 1$ and (b) $\Delta P_m \leq 0$ for $m \geq T_1$.

Proof: (a) If $P(X_t = 1) = 1$ then, letting $P_t(H)$ denote the
probability of choosing an element of H at t, $P_t(H) = 1$. Therefore
$P_t(S-H) = 0$.

Ip If one of the strategies in S-H, say s^, has been us«*d at 
least once, P^CS-H) > P.^^. But Pj^t ^ S(t)+1 ^ ^ 
since the u. > 1. Since strategy s. has been used once 
prior to t, Cj^^^^^j !^ 1 by definition. Therefore P^(S-H) j« 0. 
2. If none of the strategies in S-H have been used then
$P_t(S-H) = P_{0,t} \cdot P_{S(t)}(S-H)$. But $P_{S(t)}(S-H) \geq P_0(S-H) > 0$.
$P_{0,t} = 0$ only when all strategies have been used. This would
imply that H = S and our assumption is that $H \neq S$.
Therefore $P_t(S-H) \neq 0$ and $P(X_t = 1) < 1$.

(b) Let $T_1$ be the time by which with probability 1 each strategy
in S has been used at least once. If strategy $s_i \in H$ is used at time
$m > T_1 + 1$ then for $j \neq i$ from Equation 2.14 we have,

1 



P4 = P* .^•'^1 where 6, = 
j,ra*l j,ra 1 



P. , = P. where <5- 

i,ra+l i,ra 2 2 



"1,™ \ "l.m/ 



since >1, 0<6-<l, 



P < P P > P 

Let $\Delta P_{j,m} = P_{j,m+1} - P_{j,m}$, $\Delta P_{i,m} = P_{i,m+1} - P_{i,m}$.

Then $\Delta P_{j,m} \geq 0$, $\Delta P_{i,m} \leq 0$.

Therefore, for arbitrary $s_i \in H$ used at time m,

$$\Delta P_m = \sum_{s_j \in H} \Delta P_{j,m} \leq 0 .$$

With probability 1, $T_1$ is finite and therefore $\Delta P_m \leq 0$ for all
$m > T_1$, $T_1$ finite.



Corollary 3.15: The arena set $S_E$ of a finite arena <A,μ,S> cannot be
a limiting set for an SR1. In fact, no $H \subset S$ such that $P_0(H) < 1$
can be a limiting set, and each strategy in S with nonzero initial
probability $P_0(s)$ will be used infinitely often by the SR1 algorithm.

We now show that the loss incurred by an SR1 algorithm over a
finite restricted deterministic arena is infinite.

Theorem 3.16: With probability 1, the loss incurred by an SR1
algorithm over a finite restricted arena <A,μ,S> is not finite. That
is, with probability 1,



s=i ^ 



and therefore 



Proof: By Corollary 3.15, the arena set $S_E$ of <A,μ,S> is not a limiting
set. Therefore with probability 1, at least one strategy $s' \notin S_E$
is used infinitely often. Therefore Lemma 3.9 shows that for $s'$, with
probability 1,



4V 



since D * * (y*^. *.^ - *) > 0 for all s,t,i 



s s 

E D « !: S ^ > D . ^ ^ 0- 



By lemma 3.11, I D « D s= «> • 

s=l 



We have now shown that an SR1 over a restricted deterministic
arena will, with probability 1, not achieve finite losses. However,
the theorem in Holland (1970) is interpreted as showing that a
reproductive plan would, with probability 1, achieve finite losses. Let
us define the "expected" loss of the algorithm to be:

3.17   $EL_t = \mu^*_t - \sum_{k=1}^{N(t)} \mu_{k,t} P_{k,t} - P_{0,t}\,\lambda_{S-S(t)}$

where $\lambda_{S-S(t)}$ is the expected payoff resulting from strategies in
S-S(t) and the $P_{k,t}$ are defined as in 2.10. Now Holland's result
in our notation states:

Proposition 3.18: (Holland) An SR1 algorithm over a restricted arena
<A,μ,S> will, with probability 1, satisfy the following criteria:

(1) $\lim_{T\to\infty} \sum_{t=1}^{T} EL_t < \infty$ if |S| is finite;

(2) $\lim_{T\to\infty} \left( \sum_{t=1}^{T} \mu^*_t - \sum_{t=1}^{T} EL_t \right) / \sum_{t=1}^{T} \mu^*_t = 1$ otherwise.

This proposition would appear to conflict with our results. However,
this expression for expected loss, $EL_t$, is less than the loss the
algorithm would actually receive at any time t. That is, for |S|






finite and t >
" 't " j^^i'k.t^.t = - '"j,t^''j,t strategy $s_j$ used
at time t.



Therefore, showing that $\sum_{t=1}^{\infty} EL_t$ is finite in no way implies that any
actual use of the algorithm will achieve finite losses, and $EL_t$ is
not an adequate expression for the expected loss of the algorithm.

We could change the definition of an SR1 algorithm so that the
strategies in S were used in a more parallel fashion. That is, the
strategy chosen at time t actually would calculate the element of A
it would calculate if it were being used for the t-th time. If we
define



uCA^Ct)) If s. is used at time t 



( 



otherwise 



$\mu'_{k,t}$ represents the evaluation a strategy would receive if it had
actually been used t times, whether it had or not. This would be
possible if the strategies in S did not use the "feedback" evaluation,
$\mu(A_s(t-1))$, of the element of A selected at time t-1 to decide which
element of A to select at time t.

Definition 3.19: An SR2 is an SRP where the function PAYOFF returns
the value $\mu(A_j(t))$ for v and $\mu^*$.

The expected loss, defined analogously to 3.17,

3.20   $EL'_t = \mu^*_t - \sum_{k=1}^{N(t)} \mu'_{k,t} P_{k,t} - P_{0,t}\,\lambda_{S-S(t)}$

is still less than the actual loss on any particular use of the
algorithm. However, we cannot guarantee that the loss would be
infinite. Even though each plan will be used infinitely often, it is
possible that for some $s' \notin S_E$, with $\{k_i\}$ the infinite set of times
at which $s'$ is used,



If this happens for all strategies in S, then the loss will be finite.
We see immediately, however, that the loss is not finite if even one
strategy in S has an evaluation strictly bounded away from $\mu^*$.

We finish this chapter by demonstrating that under the restrictions
we have been imposing, an SR1 algorithm will converge, with
probability 1, to a subset of the arena set of S.

Theorem 3.21: An SR1 over a finite, restricted deterministic arena
<A,μ,S> converges with probability 1 to a set $H \subseteq S_E$, where $S_E$ is the
arena set of S. That is, with probability 1,

$$\lim_{t\to\infty} \sum_{s_j \in H} P_{j,t} = 1 .$$

Proof: By Corollary 3.15, every strategy in S with a nonzero initial
probability must be used infinitely often by the SR1 algorithm.
Consider the products



t-i 

j,t-l n^t. 




If $z_n \neq j$, the numerator and the denominator are equal. If
$m_1, m_2, \ldots$ are the times at which $z_t = j$; that is, the times at
which strategy $s_j$ is actually used, then $w(j,m_q) = q$ and

$$\lim_{t\to\infty} c_{j,t-1} = \lim_{M\to\infty} \prod_{q=1}^{M} \frac{\mu(A_j(q)) + k}{\mu^* + k} .$$



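Now,

$$1 - \frac{\mu(A_j(q)) + k}{\mu^* + k} = \frac{\mu^* - \mu(A_j(q))}{\mu^* + k} ,$$

for $s_j \in S_E$, $\sum_{q=1}^{\infty} [\mu^* - \mu(A_j(q))] < \infty$, and

for $s_j \notin S_E$, $\sum_{q=1}^{\infty} [\mu^* - \mu(A_j(q))] = \infty$.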

We now recall the following basic result [Theorem 7, page 96 of Knopp
(1956)]:

A product of the form $\prod_{\nu=1}^{\infty} (1 - a_\nu)$, with $0 \leq a_\nu < 1$, is convergent
(to a nonzero value) if, and only if, $\sum_{\nu=1}^{\infty} a_\nu$ converges.
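Two textbook cases illustrate the criterion (they are illustrations chosen here, not examples from the report): with $a_\nu = 1/(\nu+1)^2$ the series converges and the partial products tend to 1/2, while with $a_\nu = 1/(\nu+1)$ the series diverges and the partial products tend to 0. The helper names below are hypothetical.

```python
from fractions import Fraction
from typing import Callable


def partial_product(a: Callable[[int], Fraction], n: int) -> Fraction:
    """Exact value of prod_{v=1}^{n} (1 - a(v))."""
    result = Fraction(1)
    for v in range(1, n + 1):
        result *= 1 - a(v)
    return result


def summable_terms(v: int) -> Fraction:
    # a_v = 1/(v+1)^2: the series sum a_v converges.
    return Fraction(1, (v + 1) ** 2)


def harmonic_terms(v: int) -> Fraction:
    # a_v = 1/(v+1): the series sum a_v diverges.
    return Fraction(1, v + 1)
```

In closed form the first partial product is (n+2)/(2(n+1)) and the second is 1/(n+1), which is the same dichotomy the proof applies to strategies inside and outside the arena set.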

Applying this we find that

(a) for $s_j \in S_E$: $\lim_{t\to\infty} c_{j,t-1} = \lim_{M\to\infty} \prod_{q=1}^{M} \frac{\mu(A_j(q)) + k}{\mu^* + k} = \pi_j > 0$;

(b) for $s_j \notin S_E$: $\lim_{t\to\infty} c_{j,t-1} = \lim_{M\to\infty} \prod_{q=1}^{M} \frac{\mu(A_j(q)) + k}{\mu^* + k} = 0$.

Now

$$\lim_{t\to\infty} P_{j,t} = \lim_{t\to\infty} \frac{c_{j,t-1}}{\sum_{i=1}^{|S|} c_{i,t-1}} .$$

The denominator is always nonzero since we are in an arena and $S_E$ is not
empty.

From (b), if $s_j \notin S_E$, $\lim_{t\to\infty} P_{j,t} = 0$.

From (a), if $s_j \in S_E$, $\lim_{t\to\infty} P_{j,t} = \pi_j / \sum_{s_i \in S_E} \pi_i$.

With probability 1, the times $t_i$ are finite for those $s_j \in H \subseteq S_E$ such
that $P_0(s_j) > 0$ and therefore $\lim P_{j,t} \neq 0$.

Therefore, $\lim_{t\to\infty} \sum_{s_j \in S_E} P_{j,t} = 1$.



Now we have shown that an SR1 algorithm converges to a set of good
strategies in a very restricted problem basis. However, the proof of
Theorem 3.21 does not generalize to a problem basis that is not an
arena. This is obvious because if <A,μ,S> is not an arena, we cannot
guarantee the convergence for any of the $c_j$. The proof also assumes
that $\mu^* \geq \mu(A_j(t))$. This assumption is not always justified in a
nondeterministic problem basis.



CHAPTER 4
PURE PROBLEM BASES



Section I: The Search for an Arena 

Let us now consider a pure problem basis, <A,μ,S>, where S is a
set of pure strategies and the variances of the random variables
$\mu(a)$, $a \in A$, are non-zero. We suppose there is a one-to-one, onto mapping
between A and S. We will use $\rho_s$ for $\rho_a$, the mean of the function $\mu(a)$.
We will denote $\mu(a)$ by $\mu_s$. We note that $\mu^* = \operatorname{lub}_{s \in S} \rho_s$ is not dependent
on the time variable since we are dealing with pure strategies in a
nonvariable problem basis.

Let us propose some definitions for asymptotically optimal strategies
in this basis. If we can find such definitions we can then define an
arena and an arena set over a pure problem basis similar to Definitions
3.3 and 3.4 for a deterministic problem basis. If not, we must find
other ways to explore the convergence of SR type algorithms over pure
problem bases. The two most obvious candidates for such a definition
are:

A. A strategy $s \in S$ in a pure problem basis <A,μ,S> is asymptotically
optimal if $\left| \lim_{n\to\infty} \sum_{t=1}^{n} (\mu^* - \mu_{s,t}) \right| < \infty$ a.s.

B. A strategy $s \in S$ in a pure problem basis <A,μ,S> is
asymptotically optimal if $\left| \lim_{n\to\infty} \sum_{t=1}^{n} (v^*_t - \mu_{s,t}) \right| < \infty$ a.s.,
where $v^*_t = \max_{s \in S} (\mu_{s,t})$.

If we accept either of these definitions, there are no asymptotically
optimal strategies in a pure problem basis. We prove this
statement in the following theorems.

Lemma 4.1: Let $\{Y_t\}$ be a sequence of independent identically distributed
real random variables, then

a. If $\exists\, \varepsilon > 0$, $\delta > 0$, N such that for all $t \geq N$, $P(Y_t > \varepsilon) > \delta$,
then $\lim_{n\to\infty} \sum_{t=1}^{n} Y_t$ a.s. does not exist.

b. If $\exists\, \varepsilon < 0$, $\delta > 0$, N such that for all $t \geq N$, $P(Y_t < \varepsilon) > \delta$,
then $\lim_{n\to\infty} \sum_{t=1}^{n} Y_t$ a.s. does not exist.

Proof: (a) From the hypothesis $\sum_{t=1}^{\infty} P(Y_t > \varepsilon) = \infty$. Since the $Y_t$ are
independent identically distributed real random variables (iidrrv), the
Borel-Cantelli theorem tells us that with probability 1, $Y_t > \varepsilon$ infinitely
often. Therefore the terms of the sequence $\{Y_t\}$ do not go to zero and
the $\lim_{n\to\infty} \sum_{t=1}^{n} Y_t$ a.s. does not exist. The proof of part b is similar.
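The Borel-Cantelli step can be made concrete with a short simulation. The sketch below is an illustration only: it assumes standard normal $Y_t$ and a threshold of 0.5 (neither appears in the report) and counts how often the terms exceed the threshold, which is what prevents the terms from tending to zero.

```python
import random
from typing import List, Sequence


def draw_iid_normal(n: int, seed: int) -> List[float]:
    """n independent standard normal draws from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def count_exceedances(ys: Sequence[float], eps: float) -> int:
    """Number of indices t with Y_t > eps."""
    return sum(1 for y in ys if y > eps)


def partial_sums(ys: Sequence[float]) -> List[float]:
    """Running sums Y_1, Y_1 + Y_2, ..."""
    total, out = 0.0, []
    for y in ys:
        total += y
        out.append(total)
    return out
```

Because the exceedance count keeps growing with the sample length, consecutive partial sums keep differing by more than the threshold, so they cannot settle to a limit.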

Theorem 4.2: There are no strategies in a pure problem basis which are
asymptotically optimal according to definition A.

Proof: Let $Y_t = (\mu^* - \mu_{s,t})$. Then $E[Y_t] = \mu^* - \rho_s \geq 0$, $\mathrm{VAR}[Y_t] = \mathrm{VAR}[\mu_s] > 0$
and the $Y_t$ are iidrrv. Therefore, there is an $\varepsilon$, $\delta$ and an N satisfying
part a of Lemma 4.1 and the theorem follows immediately.

Theorem 4.3: There are no strategies in a pure problem basis which are
asymptotically optimal according to definition B.

Proof: Let $Y_t = (v^*_t - \mu_{s,t})$. By definition of $v^*_t$, $Y_t \geq 0$. Since
the $Y_t$ are iidrrv and $\mathrm{VAR}[\mu_s] = \sigma_s^2 > 0$, $\mathrm{VAR}[Y_t] = \sigma_Y^2 > 0$ and we
can again apply part (a) of the lemma to show that $\lim_{n\to\infty} \sum_{t=1}^{n} Y_t$ a.s.
does not exist. In fact, since all of the terms in the sequence are
non-negative, $\lim_{n\to\infty} \sum_{t=1}^{n} Y_t = \infty$ a.s.



These results are intuitive and essentially just illustrate that the
$\mu_{s,t}$ are independent samples of the real random variable $\mu_s$.
Theorems 4.2 and 4.3 show that the concept of asymptotically optimal
strategies in a pure problem basis as defined by A and B is vacuous.

We now consider using the sample mean as the payoff of a strategy
as suggested in Holland (1970). This may be done in two ways. We
can require that the algorithm perform the calculation of the sample
mean. The algorithm then has the additional task of storing the necessary
information to calculate the sample mean for each strategy. Alternatively
we could change to a variable problem basis where the calculation of
the sample mean is made outside the algorithm in the evaluation of the
function payoff. The algorithm would treat the sample mean as the
"payoff" for the strategy. The exact implementation is not critical
to the question under investigation: Are there asymptotically optimal
strategies under Definitions A or B when $\mu_{s,t}$ is replaced by the sample
mean for strategy s? We will formulate a new version of the SR algorithm
which calculates the sample mean and uses this value to calculate the
probability vector over the strategy set.

Definition 4.4: An SR3A (SR3B) is an SRP with the following changes:
PAYOFF returns the value $\mu_j$ for v, $\mu^*$ for v*. In step 3.7 of the
definition of an SRP (Def. 2.10), if j is 0 set $\lambda_{j,0} = 0$. Step 3.10
of definition 2.10 is replaced by:

3.10 Calculate:



1. 



j ,w(.i ,t3 



^j,wcj,t).l.t«Cj'^^l)*^ 

ITITItl ^ 



2. 



for 1 < i : X(t+1) 




s 



V 



We define SR3B as above except step 3.10.2 is:



\i.w(J,t) i « j 



maxCv 



The algorithms SR3A and SR3B suggest the following definitions for
asymptotically optimal strategies:

C. A strategy $s \in S$ in a pure problem basis <A,μ,S> is
asymptotically optimal if $\left| \lim_{n\to\infty} \sum_{t=1}^{n} (\mu^* - \lambda_{s,t}) \right| < \infty$ a.s.

D. A strategy $s \in S$ in a pure problem basis <A,μ,S> is
asymptotically optimal if $\left| \lim_{n\to\infty} \sum_{t=1}^{n} (v^*_t - \lambda_{s,t}) \right| < \infty$ a.s.,
where $v^*_t = \max(\mu^*, \lambda_{s,t})$.

We now show that definition C cannot be satisfied by any strategy in a
pure problem basis.

Lemma 4.5: Let <A,μ,S> be a pure problem basis. For those strategies
$s \in S$ such that $\rho_s \neq \mu^*$, $\lim_{n\to\infty} \sum_{t=1}^{n} (\mu^* - \lambda_{s,t})$ a.s. does not exist.

Proof: $E[\mu^* - \lambda_{s,t}] = \mu^* - \rho_s > 0$. Choose $\varepsilon$ so that $\mu^* - \rho_s > \varepsilon > 0$,
then $P((\mu^* - \lambda_{s,t}) \geq \varepsilon) \geq \delta$ for $\delta > 0$ and Lemma 4.1a shows that
$\lim_{n\to\infty} \sum_{t=1}^{n} (\mu^* - \lambda_{s,t})$ a.s. does not exist.

Lemma 4.5 shows that strategies which do not obtain the maximum mean
cannot be asymptotically optimal. The next lemma shows that even if
strategies do obtain the maximal mean they cannot be asymptotically
optimal by definition C.

Lemma 4.6: Let $\mu_1, \mu_2, \ldots, \mu_t, \ldots$ be independent identically distributed
real random variables with nonzero variance.
Let $E[\mu_t] = \mu^*$, $\lambda_t = \frac{1}{t} \sum_{i=1}^{t} \mu_i$. Then it is not the case that
$\left| \lim_{n\to\infty} \sum_{t=1}^{n} (\mu^* - \lambda_t) \right| < \infty$ with probability 1.

Proof: Let $Y_n$ be the nth partial sum. Then rearranging terms,

$$Y_n = (\mu^* - \mu_1) \sum_{i=1}^{n} \frac{1}{i} + \ldots + (\mu^* - \mu_n) \frac{1}{n} .$$

Let $X_n = (\mu^* - \mu_1) \sum_{i=1}^{n} 1/i$; $Z_n = Y_n - X_n$.

$X_n$ and $Z_n$ are independent random variables since the $\mu_t$ are independent.

Since $\sum 1/i$ does not converge and the variance of $\mu_1$ is nonzero, for
any M there is an $N_1$ and $\delta_1$ such that for $n > N_1$,

$P(X_n < -2M) > \delta_1$ where $2\delta_1 < 1$.

Suppose that $\lim Y_n = N < \infty$ with probability 1. Then for $\varepsilon = (1/3)\delta_1$,
there is an $N_2$ such that for $n > N_2$,

$P(|Y_n| > M) < \varepsilon = (1/3)\delta_1$.

Let $N = \max(N_1, N_2)$. There are two mutually exclusive events (not
exhaustive) that give $|Y_N| > M$, namely $\{X_N > 2M$ and $Z_N > -M\}$ and
$\{X_N < -2M$ and $Z_N < M\}$.
Therefore we know that

$P(X_N > 2M,\ Z_N > -M) < (1/3)\delta_1$

$P(Z_N > -M) < (1/3)\delta_1 / P(X_N > 2M) < 1/3$

$P(Z_N \leq -M) > 2/3$

Similarly:

$P(Z_N \geq +M) > 2/3$

Since $\{Z_N \leq -M\}$ and $\{Z_N \geq +M\}$ are exclusive events, this is impossible.
Therefore the assumption about the convergence of $\lim \sum_{t=1}^{n} (\mu^* - \lambda_t)$ is
false and the result follows.
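The rearrangement used at the start of the proof, $\sum_{t=1}^{n} (\mu^* - \lambda_t) = \sum_{i=1}^{n} (\mu^* - \mu_i) \sum_{t=i}^{n} 1/t$, can be checked exactly with rational arithmetic. The sketch below is an illustration with hypothetical function names and an arbitrary test sequence; it is not code from the report.

```python
from fractions import Fraction
from typing import List, Sequence


def sample_means(mu: Sequence[Fraction]) -> List[Fraction]:
    """lambda_t = (mu_1 + ... + mu_t) / t for t = 1..n."""
    means, running = [], Fraction(0)
    for t, value in enumerate(mu, start=1):
        running += value
        means.append(running / t)
    return means


def partial_sum_direct(mu: Sequence[Fraction], mu_star: Fraction) -> Fraction:
    """Y_n = sum_{t=1}^{n} (mu_star - lambda_t)."""
    return sum((mu_star - lam for lam in sample_means(mu)), Fraction(0))


def partial_sum_rearranged(mu: Sequence[Fraction], mu_star: Fraction) -> Fraction:
    """Y_n = sum_{i=1}^{n} (mu_star - mu_i) * sum_{t=i}^{n} 1/t."""
    n = len(mu)
    total = Fraction(0)
    for i, value in enumerate(mu, start=1):
        tail = sum((Fraction(1, t) for t in range(i, n + 1)), Fraction(0))
        total += (mu_star - value) * tail
    return total
```

The first observation carries the full harmonic weight $\sum_{t=1}^{n} 1/t$, which is why $X_n$ alone already has unbounded spread as n grows.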



Theorem 4.7: There are no strategies in a pure problem basis which are
asymptotically optimal according to definition C.

Proof: The proof follows immediately from Lemma 4.5 and 4.6.

We would like an even stronger result than Theorem 4.7. We would
like to be able to say that the limit in definition C is nonconvergent
with probability 1 even for strategies with optimal mean. Similarly,
we would like to be able to say that the limit in definition D is
nonconvergent with probability 1. The following two theorems are the
result of collaboration with B. Koopmans, C. Qualls, P. Pathak. The proofs
are contained in Appendix A.

Theorem 4.8: Let $\{\mu_t\}$ be a sequence of independent identically distributed
random variables with nonzero variance and finite second moments, let
$E[\mu_t] = \mu^*$, then

$$\lim_{n\to\infty} \sum_{t=1}^{n} (v^*_t - \lambda_t) = \infty \text{ a.s.,}$$

where $\lambda_t = \frac{1}{t} \sum_{i=1}^{t} \mu_i$, $v^*_t = \max(\mu^*, \lambda_t)$.

Theorem 4.9: Let $\{\mu_t\}$ be a sequence of independent identically distributed
random variables with nonzero variance and let $E[\mu_t] = \mu^*$, then one of
the following three conditions holds:

i) $\sum_n \frac{S_n}{n} \to +\infty$ a.s., ii) $\sum_n \frac{S_n}{n} \to -\infty$ a.s., or
iii) $\limsup \sum_n \frac{S_n}{n} = +\infty$ a.s., $\liminf \sum_n \frac{S_n}{n} = -\infty$ a.s.

In all three cases $\sum_n \frac{S_n}{n}$ diverges a.s., where $S_n = \sum_{t=1}^{n} \mu_t$.



We can now obtain the theorem we need concerning definition D:

Theorem 4.10: There are no strategies in a pure problem basis which
are asymptotically optimal according to definition D.

Proof: Theorem 4.8 shows that strategies having optimal means ($\rho_s = \mu^*$)
cannot be asymptotically optimal according to definition D. Let s be
a strategy with $\rho_s < \mu^*$, then

$$v^*_t - \lambda_{s,t} \geq v'_t - \lambda_{s,t}$$

where $v^*_t = \max(\mu^*, \lambda_{s,t})$, $v'_t = \max(\rho_s, \lambda_{s,t})$.

Therefore $\lim \sum_{t=1}^{n} (v^*_t - \lambda_{s,t}) \geq \lim \sum_{t=1}^{n} (v'_t - \lambda_{s,t})$ a.s.

Thus no strategies can be asymptotically optimal according to D.

We have shown that using knowledge of the true means of the
distribution of the $\mu_s$ does not provide a definition for asymptotic optimality.
However, we could use an estimate of the means as a value for $\mu^*$.
Let f be a function of the sample means; we want to know if f can be
defined so that for strategies s with $\rho_s = \mu^*$

and for strategies with $\rho_s < \mu^*$

We have been exploring this problem but have as yet not developed a
function satisfying these conditions.

In order to examine the convergence of SRP algorithms over pure
problem bases, we conclude that we must employ other means than we
used for deterministic bases. The next section introduces the notion
of Linear Additive Models, and in Section III of this chapter we show that
in most cases the SRP procedures do not converge in a pure problem basis.

Section II: The Linear Additive Model

In this section we will summarize the development of the Linear
Additive Model, a generalization of the Beta Model, given by Norman
[1970]. It should be noted that Norman's results are only for the case
|S| = 2. This may appear to be a very limited analysis. However, all of
the major problems of convergence can be found in this "two dimensional"
case. In the psychological interpretation of the model, one is usually
interested in the two choice situation. Lamperti [1960] has obtained some
results for the Beta Model for the general case. Several of the results
presented in this section will be used later on. In Section III we
relate Norman's assumptions to the SRP plans we have defined.
Assumptions (Norman):

4.11 $Q_i$, i = 1,2, is a probability distribution over the Borel
     subsets of R, such that $M_i(\zeta) = \int e^{\zeta x}\, dQ_i(x)$ exists for
     $\zeta$ in some open interval J containing 0.

4.12 p is a measurable mapping of R into I = [0,1], such that
     $p(L) \to 1$ as $L \to \infty$, $p(L) \to 0$ as $L \to -\infty$, and p is bounded
     away from 0 and 1 on any finite interval. Let q = 1-p.

4.13 $T_t = (L_t, X_t)$, $t \geq 1$, is a bivariate stochastic process in
     $R \times \{1,2\}$, such that

     $P(X_t = 1 \mid L_t, T_{t-1}, \ldots, T_1) = p_t$   ($p_t = p(L_t)$),
     $P(X_t = 2 \mid L_t, T_{t-1}, \ldots, T_1) = q_t$   ($q_t = q(L_t)$),

     and for any Borel set B, with $\Delta L_t = L_{t+1} - L_t$,

     $P(\Delta L_t \in B \mid X_t, L_t, T_{t-1}, \ldots, T_1) = Q_{X_t}(B)$ almost surely.

The means of the conditional distributions of $\Delta L_t$ are very important.
In fact, their values determine the convergence or divergence of the
sequence $L_t$. Let $m_i = \int x\, dQ_i(x)$, then $m_i = M_i'(0)$, for i = 1,2.
In order for the sequence $p_t$ to converge, $L_t$ must be absorbed at either
$+\infty$ or $-\infty$. Lemmas 4.14 and 4.15 state conditions for absorption and
reflection if the $m_i$ are nonzero and are constant. Lemma 4.16 removes
the possibility for convergence of $L_t$ to some finite limit, with nonzero
$m_i$.
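A process satisfying assumptions 4.11 to 4.13 is easy to simulate, which can help build intuition for the lemmas that follow. The sketch below is illustrative only: the logistic choice of p and the Gaussian increment distributions $Q_i$ are assumptions made here (any p satisfying 4.12 and any $Q_i$ with a moment generating function near 0 would do), and the function names are hypothetical.

```python
import math
import random
from typing import List


def logistic(L: float) -> float:
    """One choice of p satisfying 4.12: p -> 1 as L -> +inf, p -> 0 as L -> -inf.
    Written in a numerically stable form for large |L|."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    e = math.exp(L)
    return e / (1.0 + e)


def simulate(m1: float, m2: float, sd: float, L0: float,
             steps: int, seed: int) -> List[float]:
    """Return the path p_1, ..., p_steps of the two-choice additive model.

    At each step choice 1 is made with probability p(L_t); the increment
    of L is then drawn from Q_1 = N(m1, sd^2) or Q_2 = N(m2, sd^2)
    (Gaussian increments are an assumption of this sketch)."""
    rng = random.Random(seed)
    L = L0
    path = []
    for _ in range(steps):
        p = logistic(L)
        path.append(p)
        choice = 1 if rng.random() < p else 2
        L += rng.gauss(m1 if choice == 1 else m2, sd)
    return path
```

With both means positive the simulated $p_t$ drifts to 1, and with both negative it drifts to 0; mixed signs give the absorbing or oscillating behavior classified in the results below.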

Lemma 4.14: (Norman) a. If $m_1 < 0$, $\liminf L_t < \infty$ a.s. For any
$\delta > 0$ such that $M_1(\delta) < 1$, there is a constant $B(\delta)$ such that
$\limsup E_L[v_t^{\delta}] < B(\delta)$.

b. If $m_2 > 0$, $\limsup L_t > -\infty$ a.s. For any $\delta < 0$ such that $M_2(\delta) < 1$,
there is a constant $B(\delta)$ such that $\limsup E_L[v_t^{\delta}] < B(\delta)$.

Lemma 4.15: (Norman) a. If $m_1 > 0$, then $\limsup L_t = \infty$ implies
$\lim L_t = \infty$ a.s., and $g_2(L) = P_L(\lim L_t = -\infty) \to 0$ as $L \to \infty$.

b. If $m_2 < 0$, then $\liminf L_t = -\infty$ implies $\lim L_t = -\infty$ a.s.,
and $g_1(L) = P_L(\lim L_t = \infty) \to 0$ as $L \to -\infty$.

Lemma 4.16: (Norman) a. If $m_1 > 0$ or $m_2 > 0$, $P(\limsup L_t \in R) = 0$.
b. If $m_1 < 0$ or $m_2 < 0$, $P(\liminf L_t \in R) = 0$.

With these lemmas, four possibilities for the asymptotic behavior
of $p_t$ can be distinguished provided that the $m_i$ are nonzero and that
the $Q_i$ are not dependent upon t. Either $p_t$ converges to 1 a.s., converges
to 0 a.s., does not converge to any limit, or has a probability r of
converging to 0 and a probability 1-r of converging to 1.

Theorem 4.17. (Norman)

a. If $m_1 > 0$ and $m_2 > 0$, $\lim p_t = 1$ a.s.

b. If $m_1 < 0$ and $m_2 < 0$, $\lim p_t = 0$ a.s.

c. If $m_1 > 0$ and $m_2 < 0$, $g_1(L) + g_2(L) = 1$, where $g_1(L) = P_L[\lim p_t = 1]$,
$g_2(L) = P_L[\lim p_t = 0]$. In addition $g_1(L) > 0$,
$g_2(L) > 0$, $g_1(L) \to 1$ as $L \to \infty$ and $g_2(L) \to 1$ as $L \to -\infty$.

d. If $m_1 < 0$ and $m_2 > 0$, $\limsup p_t = 1$ and $\liminf p_t = 0$ a.s.

These results are independent of the initial value of $L_1$.
Variations of this theorem for the Beta Model can be found in
Lamperti and Suppes [1960].

Two relations are developed by Norman which will prove useful
later: Let $v_t = e^{L_t}$, then

Equation 4.18:
Equation 4.19:



Section III: Linear Models and Pure Problem Bases

Now we relate Norman's assumptions to the SR1 algorithm where
|S| = 2. Let $T_1$ be the time at which the second strategy is selected.
This time is finite with probability one.
Let



*f t«l ^ 



Then from 2„12 and 2,13 we obtain: 
Equation 4,20 

for t < T- ^ = 1-2"^^*^^ 



for t > T^ 



^2,t " ^"Vi^) = <l(V • 

For $t \geq T_1$, assumption 4.12 of Norman is certainly satisfied by
equation 4.20. Let $\Delta L_t = L_{t+1} - L_t$ and let $Q_{i,t}$ be the probability
distribution of $\mu_{i,t}$. If strategy i is used at time t we use the notation
$\Delta L_{i,t}$. Let

equation 4.21.   $M_{i,t}(\zeta) = \int e^{\zeta \Delta L_{i,t}}\, dQ_{i,t}(\mu_{i,t})$,  i = 1,2.

Since '^j * 1 the assumption that the moment generating 

functions of the $\mu(a)$ exist (definition 2.1), $M_{i,t}(\zeta)$ exists for $\zeta$
in some open interval J containing 0. Therefore, assumption 4.11 is
satisfied if we replace $Q_i$ with $Q_{i,t}$. If the set S of strategies is a
set of pure strategies then we can remove the dependence on the time
variable and satisfy assumption 4.11. Assumption 4.13 follows from the
definitions of $P_{1,t}$, $P_{2,t}$ and $\Delta L_t$ in a straightforward manner.

Unfortunately, SRP plans do not fall simply into the categories
for convergence outlined by Norman. Therefore, we must first extend
his results to include the cases where the $m_i$ might be 0.

Lemma 4.22. If $m_1 = 0$, $\sigma_1 > 0$, or $m_2 = 0$, $\sigma_2 > 0$, then
(a) $P(\limsup L_t \in R) = 0$. (b) $P(\liminf L_t \in R) = 0$.

Proof: Norman's proof of Lemma 4.16 relies on the observation that if
$m_i > 0$, then $Q_i([2\varepsilon, \infty)) > 0$ for some $\varepsilon > 0$ and $Q_i((-\infty, 2\varepsilon]) > 0$ for
some $\varepsilon < 0$. This same observation holds if $m_i = 0$ and we know that
$\sigma_i > 0$. Once we have made this observation, Norman's proof applies to
the present lemma.

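Lemma 4.23. a) If $m_1 = 0$, $\sigma_1 > 0$ then $\lim L_t \neq +\infty$.

b) If $m_2 = 0$, $\sigma_2 > 0$ then $\lim L_t \neq -\infty$.

Proof:

a) Suppose $\lim L_t = +\infty$ a.s. Since $m_1 = 0$, we can choose a $\lambda < 0$ and an
$\varepsilon > 0$ such that $M_1(\lambda) > 1 + \varepsilon$. Then

$$\lim_{t\to\infty} \lambda L_t = -\infty ,$$

$$\lim_{t\to\infty} e^{\lambda L_t} = 0 ,$$

and

$$\lim_{t\to\infty} [M_1(\lambda) p_t + M_2(\lambda) q_t] = M_1(\lambda) .$$

Given $\varepsilon$ we can find $N_0$ such that for all $t > N_0$

$$M_1(\lambda) p_t + M_2(\lambda) q_t > M_1(\lambda) - \varepsilon .$$

By Equation 4.19,

$$\lim_{t\to\infty} E[e^{\lambda L_{t+1}}] \geq \lim_{t\to\infty} (M_1(\lambda) - \varepsilon)^{t - N_0} K, \quad K > 0 .$$

But $\lim E[e^{\lambda L_{t+1}}] = 0$, so $K \lim (M_1(\lambda) - \varepsilon)^{t - N_0} \leq 0$.
By our choice of $\lambda$ and $\varepsilon$, $M_1(\lambda) - \varepsilon$ is a constant greater than 1.
Since K is positive, the limit cannot be less than or equal to 0.
Therefore we have a contradiction and our original assumption that
$\lim L_t = +\infty$ is incorrect. The proof of b) is similar.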

Theorem 4.24

a) If $m_1 = 0$, $m_2 > 0$, $\sigma_1 > 0$ then $\limsup p_t = 1$ and $\liminf p_t = 0$.

b) If $m_1 < 0$, $m_2 = 0$, $\sigma_2 > 0$ then $\limsup p_t = 1$ and $\liminf p_t = 0$.

c) If $m_1 = 0$, $m_2 = 0$, $\sigma_1 > 0$, $\sigma_2 > 0$ then $\limsup p_t = 1$ and $\liminf p_t = 0$.

d) If $m_1 = 0$, $m_2 < 0$, $\sigma_1 > 0$ then $\lim p_t = 0$ a.s.

e) If $m_1 > 0$, $m_2 = 0$, $\sigma_2 > 0$ then $\lim p_t = 1$ a.s.

Proof:

a) From Lemma 4.14b, $\limsup L_t > -\infty$ a.s.; Lemma 4.22a shows that
$P(\limsup L_t \in R) = 0$. Therefore $\limsup L_t = +\infty$ a.s., and by 4.12,
$\limsup p_t = 1$.

From Lemma 4.22b, $P(\liminf L_t \in R) = 0$. If $\liminf L_t = +\infty$ a.s.,
then since $\limsup L_t = +\infty$ a.s., $\lim L_t = +\infty$ a.s. But since $\sigma_1 \neq 0$
this contradicts Lemma 4.23a. Therefore $\liminf L_t = -\infty$ a.s., and by 4.12
$\liminf p_t = 0$.

b) The proof is similar to a.

c) By Lemma 4.22, $P(\limsup L_t \in R) = 0$, $P(\liminf L_t \in R) = 0$. By Lemma 4.23,
$\lim L_t \neq +\infty$ and $\lim L_t \neq -\infty$. Therefore, $\limsup L_t \neq \liminf L_t$ and
$\limsup L_t = +\infty$ a.s., $\liminf L_t = -\infty$ a.s., and the theorem follows.

d) By Lemma 4.22, $\liminf L_t \notin R$ a.s. and $\limsup L_t \notin R$ a.s. If
$\liminf L_t = +\infty$ then $\limsup L_t = +\infty$ and $\lim L_t = +\infty$. But this contradicts
Lemma 4.23a. Therefore $\liminf L_t = -\infty$, and by Lemma 4.15b and $m_2 < 0$,
$\lim L_t = -\infty$ a.s., and by assumption 4.12 $\lim p_t = 0$ a.s.

e) By Lemma 4.22, $\liminf L_t \notin R$ a.s. and $\limsup L_t \notin R$ a.s.
If $\limsup L_t = -\infty$ then $\liminf L_t = -\infty$ and $\lim L_t = -\infty$. But this
contradicts Lemma 4.23b. Therefore $\limsup L_t = +\infty$, and by Lemma 4.15a
and $m_1 > 0$, $\lim L_t = +\infty$ a.s., and by assumption 4.12, $\lim p_t = 1$ a.s.

The intuitive character of the distinction between conditions d
and e of the theorem and the other three conditions should be clear.
If $m_1 > 0$ then probability one of selection of strategy 1 is an absorbing
barrier. If $m_1 \leq 0$ then probability one of selection of strategy 1 is
a reflecting barrier. Similarly if $m_2 < 0$ then probability zero of
selection of strategy 1 is an absorbing barrier. If $m_2 \geq 0$ then probability
zero of selection of strategy 1 is a reflecting barrier.
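The absorbing/reflecting reading of Theorems 4.17 and 4.24 can be summarized as a small decision rule. The sketch below is an illustrative restatement, not code from the report; the function name and the returned labels are hypothetical, and it assumes the variance conditions of Theorem 4.24 hold whenever a mean is zero.

```python
def classify_limit_behavior(m1: float, m2: float) -> str:
    """Classify the asymptotic behavior of p_t from the mean increments.

    m1 is the mean increment of L when strategy 1 is chosen and m2 the
    mean increment when strategy 2 is chosen. Assumes nonzero variance
    of the increments whenever the corresponding mean is zero.
    """
    one_is_absorbing = m1 > 0    # otherwise p = 1 is a reflecting barrier
    zero_is_absorbing = m2 < 0   # otherwise p = 0 is a reflecting barrier
    if one_is_absorbing and zero_is_absorbing:
        return "converges to 0 or to 1"   # Theorem 4.17c
    if one_is_absorbing:
        return "converges to 1"           # Theorems 4.17a, 4.24e
    if zero_is_absorbing:
        return "converges to 0"           # Theorems 4.17b, 4.24d
    return "oscillates"                   # Theorems 4.17d, 4.24a-c
```

The oscillating case, with both barriers reflecting, is the one that matters for the convergence results that follow.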

Definition 4.25: A function f(x) is strictly concave if the tangent
to f(x) at any x lies above the graph of f. That is, $f'' < 0$.

Lemma 4.26: If f is a real, continuous and strictly concave function
defined on an interval (a,b) of the real line, X a real random variable
defined on (a,b), then

$$E[f(X)] \leq f(E[X]) .$$

Proof:

Since f is concave:

$$f(x) \leq f(E[X]) + f'(E[X])(x - E[X])$$

$$\int_a^b f(x)\, dP \leq \int_a^b \left( f(E[X]) + f'(E[X])(x - E[X]) \right) dP$$

$$E[f(X)] \leq f(E[X]) .$$

Corollary 4.27: The natural logarithm is a concave function and therefore
$E[\ln(X)] \leq \ln(E[X])$.
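The inequality is easy to check numerically for a discrete random variable. The sketch below assumes X is uniform on a finite list of positive values; that setup and the helper name `jensen_gap` are illustrative choices, not part of the report.

```python
import math
from typing import Sequence


def jensen_gap(values: Sequence[float]) -> float:
    """Return ln(E[X]) - E[ln X] for X uniform on the given positive values.

    By Corollary 4.27 the result is non-negative, and it is zero when
    all values coincide.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    n = len(values)
    mean = sum(values) / n
    mean_log = sum(math.log(v) for v in values) / n
    return math.log(mean) - mean_log
```

For X uniform on {1, 2, 4, 8} the gap is ln(3.75) - 1.5 ln 2, roughly 0.28; the gap shrinks to zero as the distribution concentrates on a single value.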
Now we consider the convergence of SR1 a and b plans. Let
|S| = 2 and suppose that strategy 1 is such that $\rho_1 = \mu^*$ (the case
$\rho_2 = \mu^*$ is symmetric). Then we first need to calculate the values of
$m_1$ and $m_2$ for an SR1a.

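$$m_1 = E[\Delta L_{1,t}] = E\left[ \ln \frac{\mu_1 + k_1}{\mu^* + k_1} \right]$$

applying 4.27,

$$m_1 \leq \ln(\rho_1 + k_1) - \ln(\mu^* + k_1) = 0 .$$

Similarly, applying 4.27,

$$m_2 \geq \ln(\mu^* + k_1) - \ln(\rho_2 + k_1)$$

and since $\mu^* > \rho_2$,

$$m_2 > 0 .$$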

We now have the conditions necessary to apply theorems 4.17d, 4.24a and
b and obtain:

Theorem 4.28: An SR1a plan over a pure problem basis <A,μ,S> with
|S| = 2 will not converge to the strategy with higher mean. In fact,
$\limsup P_{1,t} = 1$ and $\liminf P_{1,t} = 0$.
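A small numeric example shows how the signs of the mean increments come out. The sketch below assumes the increments have the form $\ln((\mu_1 + k)/(\mu^* + k))$ when strategy 1 is used and $\ln((\mu^* + k)/(\mu_2 + k))$ when strategy 2 is used, which is a reading of the partly illegible derivation above, and it uses made-up two-point payoff distributions; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mean_increments(payoffs_1: Sequence[float], payoffs_2: Sequence[float],
                    k: float) -> Tuple[float, float]:
    """Return (m1, m2) for equally likely payoff values.

    Strategy 1 is assumed optimal, so mu_star is the mean of payoffs_1.
    The increment forms used here are assumptions of this sketch.
    """
    if any(u + k <= 0 for u in list(payoffs_1) + list(payoffs_2)):
        raise ValueError("need payoff + k > 0 for every payoff value")
    mu_star = sum(payoffs_1) / len(payoffs_1)
    m1 = sum(math.log((u + k) / (mu_star + k)) for u in payoffs_1) / len(payoffs_1)
    m2 = sum(math.log((mu_star + k) / (u + k)) for u in payoffs_2) / len(payoffs_2)
    return m1, m2
```

With payoffs {1, 3} for strategy 1, {0.5, 1.5} for strategy 2 and k = 1, this gives $m_1 = \frac{1}{2}\ln(8/9) < 0$ and $m_2 = \frac{1}{2}\ln 2.4 > 0$, the sign pattern under which Theorem 4.17d predicts oscillation rather than convergence.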

Let $v^*_t = \max(\mu^*, \mu_j)$ where j is the strategy used at time t.
Then for the SR1b plan we have

$$m_2 = E[\ln(v^*_t + k_1)] - E[\ln(\mu_{2,t} + k_1)]$$

by 4.27   $\geq E[\ln(v^*_t + k_1)] - \ln(\rho_2 + k_1)$;

since $v^*_t \geq \rho_2$, $\ln(v^*_t + k_1) \geq \ln(\rho_2 + k_1)$ and $E[\ln(v^*_t + k_1)] \geq \ln(\rho_2 + k_1)$;
therefore, $m_2 \geq 0$, and

$$m_1 \leq \ln(\mu^* + k_1) - E[\ln(v^*_t + k_1)] ;$$

since $v^*_t \geq \mu^*$, $\ln(v^*_t + k_1) \geq \ln(\mu^* + k_1)$, and $E[\ln(v^*_t + k_1)] \geq \ln(\mu^* + k_1)$;
therefore, $m_1 \leq 0$.

Again applying theorem 4.17d, 4.24a and b we obtain:

Theorem 4.29: An SR1b plan over a pure problem basis <A,μ,S> with
|S| = 2 will not converge to the strategy with the higher mean. In fact
$\limsup P_{1,t} = 1$ and $\liminf P_{1,t} = 0$.

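We now investigate the convergence properties of SR3 A and B
plans with |S| = 2. Again we assume that $\rho_1 = \mu^*$ (the case $\rho_2 = \mu^*$
is symmetric). First, we calculate the values of the $m_i$ for SR3A:

$$m_{1,t} = E[\Delta L_{1,t}] = E\left[ \ln \frac{\lambda_{1,t} + k_1}{\mu^* + k_1} \right]$$

$$= E[\ln(\lambda_{1,t} + k_1)] - E[\ln(\mu^* + k_1)]$$

$$\leq \ln(E[\lambda_{1,t} + k_1]) - \ln(\mu^* + k_1) \quad \text{by 4.27}$$

$$\leq 0 .$$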

For $m_2$ we find:

$$m_{2,t} > 0 .$$

By similar calculations for the SR3B plans with $v^*_t = \max(\mu^*, \lambda_{j,t})$,
where j is the strategy used at time t, we find that the same relations
hold and $m_{2,t} > 0$ while $m_{1,t} \leq 0$.

Lemma 4.30: If $m_{1,t} \leq 0$, $m_{2,t} \geq 0$ for all t, $\sigma_{1,t} > 0$ and $\sigma_{2,t} > 0$
then $\limsup p_t = 1$, and $\liminf p_t = 0$.

Proof: The proof is similar to that for Theorem 4.24 and relies on
4.24 part c.

Now we have shown the following:

Theorem 4.31: SR3A and SR3B plans over a pure problem basis <A,μ,S>
with |S| = 2 will not converge to the strategy with the higher mean.
In fact $\limsup P_{1,t} = 1$ and $\liminf P_{1,t} = 0$.



CHAPTER 5 
CONCLUSIONS 



In this work we have examined the convergence properties of
a class of probabilistic sequential adaptive schemes called sequential
reproductive plans, type I (SRP). Theorem 3.21 shows that a
subclass, the SR1 plan over a finite restricted deterministic arena,
converges with probability 1 to a set of "good" strategies. However,
Theorem 3.16 shows that these plans do not converge fast enough to
achieve the finite loss claimed in Holland [1970].

We have shown in Chapter 4 (Theorems 4.2, 4.3, 4.7, and 4.10)
that there is not an intuitive analogue in pure problem bases for
the notions of optimal strategy and arena which were defined for
deterministic bases. In fact, by extending results in mathematical
psychology, we have shown that several large subclasses of SRP plans
do not converge in pure problem bases (Theorems 4.28, 4.29, 4.31).

Because of the convergence problems of SRP type I plans, we
suggest that they are not adequate models of the duplication process
in genetic adaptation. Two further plans have been studied as a
result of these findings. A plan which we call SRP Type II has
been developed by Holland and Moler [unpublished] which uses the
concept of an arena as a foundation for a non-probabilistic, block
structured sampling scheme. This plan does overcome many of the
difficulties of SRP Type I plans. However, since it relies on the
concept of an arena, there are convergence problems in non-deterministic
bases. Holland [1973] has examined the convergence of a much
simpler implementation of the duplication process.

Extensions of this work may be made in several directions. The
results we have obtained on Linear Additive models could be extended
to n dimensions. Further models of the duplication operator can be
studied and incorporated in a detailed theoretical study of the other
genetic operators in the artificial adaptive framework. We are
presently working on a theoretical comparison of artificial genetic
techniques with numerical analysis techniques in n dimensional function
maximization.



APPENDIX A* 

Theorem: Let $X_1, X_2, \ldots$ be a sequence of independent identically
distributed real random variables, not identically zero, and let
$S_n = X_1 + \cdots + X_n$. Then

$$\sum_{n=1}^{\infty} \frac{S_n}{n} \text{ diverges a.s.}$$

and a certain trichotomy holds.

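Proof: 1. Suppose first that $P(X_1 > 0) > 0$ and $P(X_1 < 0) > 0$.
Writing $\sum_{n=1}^{m} \frac{S_n}{n} = X_1 \sum_{i=1}^{m} (1/i) + Y_m$, we note that

$[\limsup Y_m < \infty,\ X_1 \sum_{i=1}^{m} (1/i) \to -\infty] \subset [\sum_{n=1}^{m} (S_n/n) \to -\infty]$

$[\liminf Y_m > -\infty,\ X_1 \sum_{i=1}^{m} (1/i) \to +\infty] \subset [\sum_{n=1}^{m} (S_n/n) \to +\infty]$

$[\liminf Y_m = -\infty,\ X_1 \sum_{i=1}^{m} (1/i) \to -\infty] \subset [\liminf \sum_{n=1}^{m} (S_n/n) = -\infty]$

$[\limsup Y_m = +\infty,\ X_1 \sum_{i=1}^{m} (1/i) \to +\infty] \subset [\limsup \sum_{n=1}^{m} (S_n/n) = +\infty]$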

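2. By the Hewitt-Savage 0-1 law, the four events on the right hand
side of the above implications have 0 or 1 probabilities. Now
$P(X_1 \sum_{i=1}^{m} (1/i) \to -\infty) > 0$ and $P(X_1 \sum_{i=1}^{m} (1/i) \to +\infty) > 0$ and at least
one of the following is true: $P(\limsup Y_m < \infty) > 0$, $P(\liminf Y_m > -\infty) > 0$
or $P(\liminf Y_m = -\infty,\ \limsup Y_m = +\infty) > 0$.

Consequently we have the following trichotomy:

i) $\sum_{n=1}^{m} \frac{S_n}{n} \to -\infty$ a.s.

ii) $\sum_{n=1}^{m} \frac{S_n}{n} \to +\infty$ a.s.

iii) $\liminf \sum_{n=1}^{m} \frac{S_n}{n} = -\infty$ and $\limsup \sum_{n=1}^{m} \frac{S_n}{n} = +\infty$ a.s.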

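* See Koopmans et al. [to be published].

Writing $\frac{S_n}{n} = \frac{S_n^+}{n} - \frac{S_n^-}{n}$, where $S_n^+ = \max(0, S_n)$ and $S_n^- = \max(0, -S_n)$,
we see that in case i) $\sum \frac{S_n^-}{n} = \infty$, in case ii) $\sum \frac{S_n^+}{n} = \infty$,
and in case iii) $\sum \frac{S_n^+}{n} = \infty$ and $\sum \frac{S_n^-}{n} = \infty$.

3. If $P(X_1 \leq 0) = 1$, then case i) holds with $\sum \frac{S_n}{n} = -\sum \frac{S_n^-}{n} = -\infty$ a.s.
and $\sum \frac{S_n^+}{n} = 0$.

If $P(X_1 \geq 0) = 1$, then case ii) holds with $\sum \frac{S_n}{n} = \sum \frac{S_n^+}{n} = +\infty$ a.s.
and $\sum \frac{S_n^-}{n} = 0$.

In all three cases, $\sum \frac{S_n}{n}$ is divergent a.s.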
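The decomposition $S_n = S_n^+ - S_n^-$ used in the argument can be checked exactly on any finite sequence. The sketch below is an illustration with a hypothetical function name and an arbitrary test sequence; it computes the three partial series side by side.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def split_series(xs: Sequence[Fraction]) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (sum S_n/n, sum S_n^+/n, sum S_n^-/n) over n = 1..len(xs),
    where S_n = x_1 + ... + x_n, S_n^+ = max(0, S_n), S_n^- = max(0, -S_n)."""
    s = Fraction(0)
    total = Fraction(0)
    plus = Fraction(0)
    minus = Fraction(0)
    for n, x in enumerate(xs, start=1):
        s += x
        total += s / n
        plus += max(Fraction(0), s) / n
        minus += max(Fraction(0), -s) / n
    return total, plus, minus
```

Since the first component always equals the second minus the third, the full series can stay bounded only if the positive and negative parts grow together, which is the situation the second theorem of this appendix addresses.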

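Theorem:

Let $X_1, X_2, \ldots$ be i.i.d. with $E[X_1] = 0$, and $0 < \sigma^2 < \infty$.
Let $S_n = X_1 + \cdots + X_n$, $S_n^+ = \max(0, S_n)$ and $S_n^- = \max(0, -S_n)$.

Then $\sum_{n=1}^{\infty} \frac{S_n^+}{n} = +\infty$ a.s. and $-\sum_{n=1}^{\infty} \frac{S_n^-}{n} = -\infty$ a.s.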

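Proof: From the previous theorem it follows that either

$$\sum_{n=1}^{\infty} \frac{S_n^+}{n} = \infty \text{ a.s.}$$

or

$$\sum_{n=1}^{\infty} \frac{S_n^-}{n} = \infty \text{ a.s.}$$

Suppose that $\sum_{n=1}^{\infty} \frac{S_n^+}{n} = \infty$ a.s. and $\sum_{n=1}^{\infty} \frac{S_n^-}{n}$ does not diverge.

Then $\sum_{n=1}^{\infty} \frac{S_n}{n} = \infty$ a.s.,

$\therefore\ P\left( \sum_{n=1}^{\infty} \frac{S_n}{n} = +\infty \right) = 1$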

N S N , 
Consider $T_N =$

^ ^ \ a • Y 



kal * 

It is easy to see that

$$d_{1,N} \geq d_{2,N} \geq \cdots$$

and $\lim_{N\to\infty} d_{1,N} = 0$

and $\{Y_{k,N} : 1 \leq k \leq N,\ N \geq 1\}$ are uniformly small random
variables.

Let $F_{k,N}$ be the distribution function of $Y_{k,N}$. Then by the Lindeberg-Feller
criterion:

T 

N 



$\sum_{k=1}^{N} Y_{k,N}$ is asymptotically normal with zero mean and
unit variance,

if

$$\lim_{N\to\infty} \sum_{k=1}^{N} \int_{|y| > \varepsilon} y^2\, dF_{k,N} = 0$$



But 



^d^ ^, dF^ , where F^^ is the d.f. of 



k*^ 1 



(X) 
$\to 0$ as $N \to \infty$, since $\lim d_{1,N} = 0$.

So the central limit theorem holds and hence

i.e. $\lim P[T_N > 0] = 1/2$,

but this contradicts what we had before, namely,

$$\lim T_N = \sum_{n=1}^{\infty} \frac{S_n}{n} = \infty \text{ a.s.}$$

Therefore, the assumption that $\sum \frac{S_n^-}{n}$ does not diverge is false.
Hence both

$$\sum \frac{S_n^-}{n} \text{ and } \sum \frac{S_n^+}{n} \text{ diverge a.s.}$$